Supplementary Material for Geoclidean: Few-Shot Generalization in Euclidean Geometry
Participants were recruited on Prolific [Palan and Schitter, 2018] and compensated at an hourly wage of $15.8. See Figure 1 for an example of a survey question given to participants, hosted on Qualtrics. The consent text stated: "You may decline to answer any or all of the following questions. Your anonymity is assured; the researchers who have requested your participation will not receive any personal information about you."

A.4 Feature Visualizations

We provide visualizations of low-level features from ResNet50 and the Vision Transformer on a variety of rendered Geoclidean images. In contrast, the third Geoclidean task is difficult for both vision models, highlighting the intended difficulty of Euclidean geometric reasoning, which is why our task is especially interesting.
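As a minimal sketch of how such low-level feature maps can be extracted (our illustration, not the authors' exact pipeline; the choice of layer and the image path are assumptions), one can register a forward hook on an early block of a pretrained torchvision ResNet50:

```python
# Minimal sketch (not the authors' exact pipeline): extract low-level
# feature maps from a pretrained ResNet50 with a forward hook, one
# plausible way to produce visualizations like those described above.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

features = {}
def hook(name):
    def fn(module, inputs, output):
        features[name] = output.detach()
    return fn

# "layer1" holds early (low-level) features; this choice is an assumption.
model.layer1.register_forward_hook(hook("layer1"))

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("geoclidean_render.png").convert("RGB")  # hypothetical path
with torch.no_grad():
    model(preprocess(img).unsqueeze(0))

fmap = features["layer1"][0]  # shape (256, 56, 56) for a 224x224 input
print(fmap.shape, fmap.abs().mean().item())
```

Each of the 256 channels of `fmap` can then be rendered as a grayscale image to visualize what the early layers respond to.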
A Proofs
Lemma 1. Assume that Assumptions 1 and 2 hold. Then the iterates satisfy the following inequality for all k ∈ ℕ. Combining Assumption 2 with Definition 4.6 bounds the second moment of g(W_k). Summing both sides of this inequality over k ∈ {1, ..., K} and recalling Assumption 2(a), then rearranging and dividing by K, yields the result. The second condition in Eq. 4.10 controls the limiting behavior of the step sizes, while Assumption 2(b) guarantees that the model moves in a descent direction of the loss function.

Following the experimental setup in Section 5.1, we demonstrate that the proposed method empirically satisfies Assumption 2(b), and visualize in Figure 7 the sufficient-direction constant µ for the (partial) convolutional layers of the four models during end-to-end training with TREC. For SqueezeNet and ResNet-34, we show one block as representative, since the other blocks behave similarly. Several insights can be drawn from Figure 7: (i) the value of µ for each convolutional layer is consistently greater than zero, indicating that Assumption 2(b) is satisfied, which in turn ensures the convergence of TREC-equipped CNNs.
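To make the empirical check of Assumption 2(b) concrete, here is a minimal sketch (our reconstruction, not the paper's code; the exact form of the assumption is not reproduced above, so we assume the common sufficient-direction condition E[g]ᵀ∇F ≥ µ‖∇F‖²) of estimating a per-layer constant µ as the ratio ⟨g, ∇F⟩ / ‖∇F‖²:

```python
# Minimal sketch (assumed form, not the paper's code): estimate a per-layer
# sufficient-direction constant mu for an approximate gradient g against the
# exact gradient grad, assuming a condition of the form
#   <g, grad> >= mu * ||grad||^2.
import torch

def sufficient_direction_constant(g: torch.Tensor, grad: torch.Tensor) -> float:
    """Empirical mu = <g, grad> / ||grad||^2 for one layer's gradients."""
    g, grad = g.flatten(), grad.flatten()
    denom = grad.dot(grad).item()
    return g.dot(grad).item() / denom if denom > 0 else 0.0

# Toy usage: a noisy approximate gradient that still points in a descent
# direction should yield mu > 0, consistent with Figure 7.
torch.manual_seed(0)
grad = torch.randn(1000)
g = grad + 0.3 * torch.randn(1000)  # stand-in for a TREC-style approximation
print(sufficient_direction_constant(g, grad))  # positive => Assumption 2(b) holds
```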
Soft decision trees for survival analysis
Consolo, Antonio, Amaldi, Edoardo, Carrizosa, Emilio
Decision trees are popular in survival analysis for their interpretability and ability to model complex relationships. Survival trees, which predict the timing of singular events using censored historical data, are typically built through heuristic approaches. Recently, there has been growing interest in globally optimized trees, where the overall tree is trained by minimizing the error function over all its parameters. We propose a new soft survival tree model (SST), with a soft splitting rule at each branch node, trained via a nonlinear optimization formulation amenable to decomposition. Since SSTs provide for every input vector a specific survival function associated with a single leaf node, they satisfy the conditional computation property and inherit the related benefits. SST and the training formulation combine flexibility with interpretability: any smooth survival function (parametric, semiparametric, or nonparametric) estimated through maximum likelihood can be used, and each leaf node of an SST yields a cluster of distinct survival functions associated with the data points routed to it. Numerical experiments on 15 well-known datasets show that SSTs, with parametric and spline-based semiparametric survival functions, trained using an adaptation of the node-based decomposition algorithm proposed by Consolo et al. (2024) for soft regression trees, outperform three benchmark survival trees in terms of four widely-used discrimination and calibration measures. SSTs can also be extended to consider group fairness.
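To illustrate the soft-splitting idea, here is a minimal sketch (our illustration under common soft-tree conventions, not the authors' formulation): each branch node applies a sigmoid split, the probability of reaching a leaf is the product of branch probabilities along its path, and conditional computation assigns the input to the single most probable leaf:

```python
# Minimal sketch (illustrative, not the paper's model): soft routing in a
# depth-2 soft binary tree. Each branch node has weights (w, b); the
# probability of going left is sigmoid(w.x + b), and a leaf's probability
# is the product of branch probabilities along its root-to-leaf path.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def leaf_probabilities(x, branches):
    """branches: dict node_id -> (w, b) for nodes 0 (root), 1, 2."""
    p_left = {i: sigmoid(w @ x + b) for i, (w, b) in branches.items()}
    return np.array([
        p_left[0] * p_left[1],             # leaf LL
        p_left[0] * (1 - p_left[1]),       # leaf LR
        (1 - p_left[0]) * p_left[2],       # leaf RL
        (1 - p_left[0]) * (1 - p_left[2]), # leaf RR
    ])

rng = np.random.default_rng(0)
branches = {i: (rng.normal(size=3), 0.0) for i in range(3)}
x = rng.normal(size=3)
probs = leaf_probabilities(x, branches)
print(probs, probs.sum())                # probabilities sum to 1
print("assigned leaf:", probs.argmax())  # conditional computation: one leaf
```

In an SST, the leaf selected this way would carry its own maximum-likelihood survival function for the routed input.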
Revisiting Hallucination Detection with Effective Rank-based Uncertainty
Wang, Rui, Wei, Zeming, Yue, Guanzhang, Sun, Meng
Detecting hallucinations in large language models (LLMs) remains a fundamental challenge for their trustworthy deployment. Going beyond basic uncertainty-driven hallucination detection frameworks, we propose a simple yet powerful method that quantifies uncertainty by measuring the effective rank of hidden states derived from multiple model outputs and different layers. Grounded in the spectral analysis of representations, our approach provides interpretable insights into the model's internal reasoning process through semantic variations, while requiring no extra knowledge or additional modules, thus offering a combination of theoretical elegance and practical efficiency. Meanwhile, we theoretically demonstrate the necessity of quantifying uncertainty both internally (representations of a single response) and externally (different responses), providing a justification for using representations among different layers and responses from LLMs to detect hallucinations. Extensive experiments demonstrate that our method effectively detects hallucinations and generalizes robustly across various scenarios, contributing to a new paradigm of hallucination detection for LLM truthfulness.
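To make the central quantity concrete, here is a minimal sketch (our reconstruction using the common entropy-based definition of effective rank, which may differ in detail from the paper's exact formulation) of computing the effective rank of a matrix of stacked hidden states:

```python
# Minimal sketch (assumes the common entropy-based definition of effective
# rank; the paper's exact formulation may differ): stack hidden states from
# multiple responses/layers as rows and measure how spread out the spectrum
# is. Higher effective rank ~ more semantic variation ~ more uncertainty.
import numpy as np

def effective_rank(H: np.ndarray, eps: float = 1e-12) -> float:
    """H: (n_samples, hidden_dim) matrix of hidden states."""
    s = np.linalg.svd(H, compute_uv=False)
    p = s / (s.sum() + eps)                # normalize singular values
    entropy = -(p * np.log(p + eps)).sum()
    return float(np.exp(entropy))          # exp of the Shannon entropy

rng = np.random.default_rng(0)
consistent = rng.normal(size=(1, 64)).repeat(8, axis=0) + 0.01 * rng.normal(size=(8, 64))
divergent = rng.normal(size=(8, 64))
print(effective_rank(consistent))  # near 1: responses agree
print(effective_rank(divergent))   # much larger: high uncertainty
```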
Investigating Language and Retrieval Bias in Multilingual Previously Fact-Checked Claim Detection
Vykopal, Ivan, Karamolegkou, Antonia, Kopčan, Jaroslav, Peng, Qiwei, Javůrek, Tomáš, Gregor, Michal, Šimko, Marián
Multilingual Large Language Models (LLMs) offer powerful capabilities for cross-lingual fact-checking. However, these models often exhibit language bias, performing disproportionately better on high-resource languages such as English than on low-resource counterparts. We also introduce and examine a novel concept, retrieval bias, in which information retrieval systems favor certain information over others, skewing the retrieval process. In this paper, we study language and retrieval bias in the context of Previously Fact-Checked Claim Detection (PFCD). We evaluate six open-source multilingual LLMs across 20 languages using a fully multilingual prompting strategy, leveraging the AMC-16K dataset. By translating task prompts into each language, we uncover disparities in monolingual and cross-lingual performance and identify key trends based on model family, size, and prompting strategy. Our findings highlight persistent bias in LLM behavior and offer recommendations for improving equity in multilingual fact-checking. To investigate retrieval bias, we employ multilingual embedding models and examine the frequency of retrieved claims. Our analysis reveals that certain claims are retrieved disproportionately across different posts, inflating retrieval performance for popular claims while under-representing less common ones.
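As an illustration of the frequency analysis described above, here is a minimal sketch (our illustration, not the paper's analysis code; the top-k setup and synthetic embeddings are assumptions) of measuring how skewed the distribution of retrieved claims is across posts:

```python
# Minimal sketch (illustrative, not the paper's code): retrieve the top-k
# claims per post by embedding similarity, then inspect how often each
# claim appears across all posts. A heavy head indicates retrieval bias.
from collections import Counter
import numpy as np

def topk_retrieval(post_embs, claim_embs, k=5):
    """Cosine-similarity top-k retrieval; rows are L2-normalized embeddings."""
    sims = post_embs @ claim_embs.T
    return np.argsort(-sims, axis=1)[:, :k]

rng = np.random.default_rng(0)
posts = rng.normal(size=(200, 32))
posts /= np.linalg.norm(posts, axis=1, keepdims=True)
claims = rng.normal(size=(50, 32))
claims /= np.linalg.norm(claims, axis=1, keepdims=True)

retrieved = topk_retrieval(posts, claims)
freq = Counter(retrieved.flatten().tolist())
counts = np.array(sorted(freq.values(), reverse=True))
print("top-5 most retrieved claim counts:", counts[:5])
print("share of retrievals taken by top 10% of claims:",
      counts[:len(counts) // 10].sum() / counts.sum())
```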